Skip to content

Align target sequence to selected transcript on protein level#79

Open
sallybg wants to merge 2 commits intomavedb-devfrom
align-target-to-protein
Open

Align target sequence to selected transcript on protein level#79
sallybg wants to merge 2 commits intomavedb-devfrom
align-target-to-protein

Conversation

@sallybg
Copy link
Collaborator

@sallybg sallybg commented Mar 11, 2026

For protein-coding, non-reference variants, align protein-level target sequence to protein-level selected transcript sequence, and use alignment information to generate mappings. This handles within-sequence offsets between the target and selected transcript, which was not handled previously.

@sallybg sallybg requested a review from bencap March 11, 2026 17:38
@bencap bencap linked an issue Mar 11, 2026 that may be closed by this pull request
Copy link
Collaborator

@bencap bencap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sally, looks like a really nice improvement to the mapping logic and it's resolving the cases we identified soundly. There's just one potential issue we should resolve before merging.

Comment on lines 189 to 190
For protein annotations, these strings must be adjusted to match the offset defined by the start of the
transcript sequence. For genomic annotations, these strings must be adjusted to match the coordinates of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should adjust this documentation to match the new logic of aligned offsets.

Comment on lines +794 to +796
protein_align_result = align_target_to_protein(
sequence, transcript.sequence, silent
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a type guard here isinstance(transcript, TxSelectError) like we have before accessing any properties of transcript in the loop below. If we attempt to access transcript.sequence while the transcript is of type TxSelectError, we'll get an AttributeError.

I think it's enough to just guard it and then continue to the loop if it is of the wrong error type, so that we get per-row handling of error messages.

Suggested change
protein_align_result = align_target_to_protein(
sequence, transcript.sequence, silent
)
protein_align_result = None
if isinstance(transcript, TxSelectResult):
protein_align_result = align_target_to_protein(
sequence, transcript.sequence, silent
)

@@ -767,7 +814,7 @@
else:
if _hgvs_pro_is_valid(row.hgvs_pro):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the comment preceding this one, we'd I think also want to guard against a NoneType protein_align_result here. I'm not convinced that can actually occur in practice (Often in this code base things are typed as Unions, but the types are heavily correlated such that if one object is type X, another must be type Y and not Z) , but it's a good guard to have.

I think transcript is fine typed as is given my notion on the correlated types and since it is pre-exists this change.

@bencap bencap changed the base branch from mavedb-main to mavedb-dev March 12, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Align target sequence to selected transcript at the protein level

2 participants